Towards the Annotation of Penn TreeBank with Information Structure

نویسندگان

  • Bernd Bohnet
  • Alicia Burga
  • Leo Wanner
چکیده

Information Structure (IS) determines the “communicative” segmentation of the meaning of an utterance, which makes it central to the semantics–syntax– intonation interface and therefore also to NLP. Despite this relevance, IS has not received much attention in the context of the majority of the reference treebanks for data-driven NLP that already contain a semantic and syntactic layers of annotation. We present our work in progress on the annotation of the Penn TreeBank with the thematicity dimension of the IS as defined in the Meaning-Text Theory. We experiment with tagging and transitionbased parsing techniques. Especially the latter achieve acceptable accuracy with even very small training samples, which is promising for languages with scarce resources.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Towards interoperable discourse annotation. Discourse features in the Ontologies of Linguistic Annotation

This paper describes the extension of the Ontologies of Linguistic Annotation (OLiA) with respect to discourse features. The OLiA ontologies provide a a terminology repository that can be employed to facilitate the conceptual (semantic) interoperability of annotations of discourse phenomena as found in the most important corpora available to the community, including OntoNotes, the RST Discourse...

متن کامل

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

متن کامل

Improving the complement/adjunct distinction in CCGbank

One of the challenges of adapting the Penn Treebank for a specific formalism is that the target annotation often requires information represented imperfectly or not at all in the original corpus. When this occurs, the information must either be guessed with heuristics, or annotated manually. Recently, a third option has become available, due to the release of resources that supplement the Penn ...

متن کامل

Towards Interoperability for the Penn Discourse Treebank

The recent proliferation of diverse types of linguistically annotated schemes coded in different representation formats has led to efforts to make annotations interoperable, so that they can be effectively used towards empirical NL research. We have rendered the Penn Discourse Treebank (PDTB) annotation scheme in an abstract syntax following a formal generalized annotation scheme methodology, t...

متن کامل

Developing An Arabic Treebank: Methods, Guidelines, Procedures, And Tools

In this paper we address the following questions from our experience of the last two and a half years in developing a large-scale corpus of Arabic text annotated for morphological information, part-of-speech, English gloss, and syntactic structure: (a) How did we ‘leapfrog’ through the stumbling blocks of both methodology and training in setting up the Penn Arabic Treebank (ATB) annotation? (b)...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013